arXiv:1509.00394v2 [stat.CO] 28 Jun 2016 


Variance estimation in the particle filter 


Anthony Lee and Nick Whiteley 
University of Warwick and University of Bristol 

June 29, 2016 


Abstract 

This paper concerns numerical assessment of Monte Carlo error in particle filters. We show 
that by keeping track of certain key features of the genealogical structure arising from resampling 
operations, it is possible to estimate variances of a number of standard Monte Carlo approximations 
which particle filters deliver. All our estimators can be computed from a single run of a particle 
filter with no further simulation. We establish that as the number of particles grows, our estimators 
are weakly consistent for asymptotic variances of the Monte Carlo approximations and some of 
them are also non-asymptotically unbiased. The asymptotic variances can be decomposed into 
terms corresponding to each time step of the algorithm, and we show how to consistently estimate 
each of these terms. When the number of particles may vary over time, this allows approximation 
of the asymptotically optimal allocation of particle numbers. 


1 Introduction 

Particle filters, or sequential Monte Carlo methods, provide Monte Carlo approximations of integrals 
with respect to sequences of measures. In popular statistical inference applications, these measures 
arise naturally from conditional distributions in hidden Markov models, or are constructed artifi¬ 
cially to bridge between target distributions in Bayesian analysis. The numbers of particles used 
to perform the approximation controls the tradeoff between computational complexity and accuracy. 
Theoretical properties of this relationship have been the subject of intensive research; the literature 
includes a number of central limit theorems [Del Moral and Guionnet, 1999, Chopin, 2004, Kiinsch, 
2005, Douc and Moulines, 2008] and a variety of refined asymptotic [Done et ah, 2005, Del Moral et ah, 
2007] and non-asymptotic [Del Moral and Miclo, 2001, Cerou et ah, 2011[ results. These studies pro¬ 
vide a wealth of insight into the mathematical behaviour of particle filter approximations and validate 
them theoretically, but considerably less is known about how, in practice, to extract information from 
a realization of a single particle filter in order to report numerical measures of Monte Carlo error. 
This is in notable contrast to other families of Monte Carlo techniques, especially Markov chain Monte 
Carlo, for which an extensive literature on variance estimation exists. Our main aim is to address this 
gap. 

We introduce particle filters via a framework of Feynman-Kac models [Del Moral, 2004[. This 
approach allows us to identify the key generic ingredients defining particle filters and the measures they 
approximate, and emphasizes that our variance estimators can be used across application areas. Based 
on a single realization of a particle filter, we provide unbiased estimators of the variance and individual 
asymptotic variance terms for a class of unnormalized particle approximations. No estimators of these 
quantities based on a single run of a particle filter have previously appeared in the literature. Upon 
suitable rescaling, we establish that our estimators are weakly consistent for asymptotic variances 
associated with a larger class of particle approximations. One of these re-scaled estimators is closely 
related to that proposed by Chan and Lai [2013[, which is the only other consistent asymptotic variance 
estimator based on a single realization of a particle filter in the literature. We also demonstrate how 
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one can use the estimators to inform the choice of algorithm parameters in an attempt to improve 
performance. 


2 Particle filters 

2.1 Notation and conventions 

For a generic measurable space (E,f), we denote by L{£) the set of R-valued, f-measurable and 
bounded functions on E. For ip G £{£)) p a measure and K an integral kernel on (E, f), we write p,{ip) = 
f^<p(x)fi(dx), K(<p)(x) = f^K{x,dx')(p{x') and pK{A) = J^fj.{dx)K{x,A). Constant functions x G 
i? I—>■ c G R are denoted simply by c. For (p G C{£), p®‘^{x,x') = p{x)p{x'). The Dirac measure 
located at x is denoted 5x- For any sequence (a„)„gz and p < q, ap-.q = {up,... ,aq). For any 
m G N, [m] = {1,..., m}. For a vector of positive values (oi,..., am), we denote by C(ai,..., am) 
the Categorical distribution over m} with probabilities (oi/ X]™ i ■ ■ ■, Om/ When 

a random variable is indexed by a superscript N, a sequence of such random variables is implicitly 
defined by considering each value N G N, and limits will always be taken along this sequence. 


2.2 Discrete time Feynman—Kac models 

On a measurable space (X, A) with n a non-negative integer, let Mq be a probability measure, 
Ml,..., M„ a sequence of Markov kernels and Go,..., G„ a sequence of R-valued, strictly positive, 
upper-bounded functions. We assume throughout that X does not consist of a single point. We define 
a sequence of measures by 70 = Mq and, recursively, 

7p(>S') = [ 7p-i(da;)Gp_i(x)Mp(a;, S'), p G [n], S&X. (1) 

Jx 

Since 7 p(X) G (0, 00 ) for each p, the following probability measures are well-defined: 


VpiS) 


IpjS) 

7p(X)’ 


p G {0,..., n}, S £ X. 


( 2 ) 


The representation 


n—1 


Jr.(.p)=EU{Xr.)Y[GpiXp)'>, 

[ p=o } 


(3) 


where the expectation is taken with respect to the Markov chain with initial distribution Xq ~ Mq 
and transitions Xp ^ Mp{Xp-i, ■), establishes the connection to Feynman-Kac formulae. Measures 
with the structure in (l)-( 2 ) arise in a variety of statistical contexts. 


2.3 Motivating examples of Feynman-Kac models 

As a first example, consider a hidden Markov model: a bivariate Markov chain (A'p, yp)p=o,...,n where 
(A^p)p=o,...,n is itself Markov with initial distribution Mq and transitions Xp ^ Mp{Xp-i, •), and such 
that each Yp is conditionally independent of {Xq, Yq]q^p) given Xp. If the conditional distribution of 
Yp given Xp admits a density gp{Xp, •) and one fixes a sequence of observed values yo,, y-n-i, then 
with Gp{xp) = gp{xp,yp), pn is the conditional distribution of Xn given yo,... ,yn-i- Hence, pn{p) is 
a conditional expectation and 7 n(X) = 7rt(l) is the marginal likelihood of yo,..., y„_i 

As a second example, consider the following sequential simulation setup. Let ttq and tti be two 
probability measures on (X, A) such that 7ro(dx) = T^o{x)dx/Zo and TTi{dx) = T:i{x)dx/Zi, where 
TTo and 7fi are unnormalized probability densities with respect to a common dominating measure dx 
and Zi = f^7ri(x)dx, i G {0,1} are integrals unavailable in closed form. In Bayesian statistics tti 
may arise as a posterior distribution from which one wishes to sample, e.g. having multiple modes 
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and complicated local covariance structures, ttq is a more benign distribution from which sampling is 
feasible, and calculating ZijZ^ allows assessment of model fit. Introducing a sequence 0 = /3o < • • • < 

/3„ = 1 and taking Gp{x) = Mq = tto and for eachp = 1,... ,n, Mp as a Markov 

kernel invariant with respect to the distribution with density proportional to Tro{xy~^^Tri{x)^f, one 
obtains by elementary manipulations 

7p(S') = ^ [ = tti, 7„(X) = 

^0 Js ^0 

so that ? 7 i,..., forms a sequence of intermediate distributions between ttq and tti. This type of 
construction appears in [Del Moral et ah, 2006] and references therein. 

2.4 Particle approximations 

We now introduce particle approximations of the measures in (l)-(2). Let cg-.n be a sequence of 
positive real numbers and N G N. We define a sequence of particle numbers No-n by Np = [cpiV] 
for p G {0,... ,n}. To avoid notational complications, we shall assume throughout that com and N 
are such that miiipTVp > 2. The particle system consists of a sequence C = Com) where for each p, 

Cp = (Cp) ■ • ■) Cp’’) 8-114 each Q is valued in X. To describe the resampling operation we also introduce 
random variables denoting the indices of the ancestors of each random variable Q. That is, for each 
i G [iVp], is a [iVp_i[-valued random variable and we write 4p_i = (4p_j,..., for p G [n\ 

and A = 4o:„_i. 

A simple algorithmic description of the particle system is given in Algorithm 1. An important 
and non-standard feature here is that we keep track of a collection of Eve indices -Eo:n with Ep = 
{Ep,... jEp’’) for each p, which will be put to use in our variance estimators. We adopt the Eve 
terminology because A* represents the index of the time 0 ancestor of Q . The fact that Np may 
vary with p is also atypical, and allows us to address asymptotically optimal particle allocation in 
Section 6.1. On a first reading, one may wish to assume that No-,n is not time-varying, i.e. Cp = 1 so 
Np = N for all p € {0,..., n}. Figure 1 is a graphical representation of a realization of a small particle 
system. 

Algorithm 1. The particle filter. 

1. At time 0; for each i G [A^o]? sample Q ~ Mq(-) and set Eq g- i. 

2. At each time p = 1,..., n.' for each i G [A^p], 

(a) Sample ^ C {Gp_i(Cpi_i),. •., Gp_i(Cp"7')}- 

(b) Sample Q ~ , •) and set Ep G- E^ff^. 

The particle approximations of rjn and 7 ^ are defined respectively, with the convention n;io<(Gp) = 
1 , by the random measures 


= 

In 


1 


E 


Ofi , 

’n 


7 ^ = 

In 



and we observe that, similar to (2), j]^ = /'y^ (1). To simplify presentation, the dependence of 

and f]^ on co-.n is suppressed from the notation. The following proposition establishes basic properties 
of the particle approximations, which validate their use. 

Proposition 1. There exists a map af : C{X) -G [0, 00 ) such that for any ip G C{X): 
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Figure 1: A particle system with n = 3 and AqiS = (4, 3,3,4). An arrow from (p_i to Q indicates 
that the ancestor of is Cp-ii i-e- ^p-i = *• the realization shown, the ancestral indices are Ag = 
(1,2,4), Ai = (2,1,2) and A 2 = (3, 2,2,3), while the Eve indices are Eq = (1,2, 3,4), Ei = (1,2,4), 

A 2 = ( 2 , 1 , 2 ) and A 3 = ( 2 , 1 , 1 , 2 ). 


4- E {(p)} = 'Yni‘p), for all N >1, 

In {‘p) ln{p) almost surely and Avar { 7 ,^(v 3 )/ 7 „(l)} ^ crl{(p), 


3. r]^{(fi) ->• r]n{ip) almost surely and NE {rj^{ip) - -)• - r]n{p)). 


In the case that the number of particles is constant over time, Np = A, these properties are well 
known and can be deduced, for example, from various results of Del Moral [2004]. The arguments 
used to treat the general Np = [cpA] case are not substantially different, but since they seem not to 
have been published anywhere in exactly the form we need, we include a proof of Proposition 1 in the 
supplement. 


2.5 A variance estimator 

For p g C{X), consider the quantity 


which is readily computable as a byproduct of Algorithm 1 . The following theorem is the first main 
result of the paper. We state it here to make some of the practical implications of our work accessible 
to the reader before entering into more technical details; it shows that via (4), the Eve variables El^ 
can be used to estimate the Monte Carlo errors associated with 7 ,^ {p) and r]^(p). 

Theorem 1. The following hold for any p G E{X), with cr^(-) as in Proposition 1: 

1. E {7^(l)"Kf (‘^)} = var (p)} for all A > 1, 

2. NV^ {p) —>■ cr^(‘^) in probability, 

3. NV^{p} — T]^{p)) —> cr^((/J — r]n{p)) in probability. 

The proof of Theorem 1 , given in the appendix, relies on a number of intermediate results concerning 
moment properties of the particle approximations which we shall develop in the coming sections. Before 
embarking on this development let us discuss how (4) may be interpreted. Consider random variables 
,..., with sample mean X and sample variance 

--i- = --V(A*-A)2. (5) 


E PiCnMCn), ( 4 ) 


/n—l 


Vnip)=Vu (pf - n 


Nn 


Np - 1 
p=0 P , 


NniNp - 1) 
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When are independent and identically distributed it is of course elementary that (5) 

is an unbiased estimator of var(X) and consistency properties are easily established. Observe the 
resemblance between (4) and the left hand side of (5). Some of the features which distinguish these two 
expressions, notably the summation over {(*, j) : ^ and the product term in (4), are a reflection 

of the dependence between the particles and specific distributional characteristics of Algorithm 1. One 
of the main difficulties we face is to develop a suitable mathematical perspective from which to describe 
this dependence and thus establish that (4) does indeed have the properties stated in Theorem 1. 

It seems natural to ask if (4) can be re-written so as to resemble the right hand side of (5) and thus 
be interpreted as some kind of sample variance across the population of particles. This motivates the 
following corollary, using the notation 


41 = card{j : Ei = i}, 


A* = - 

Tr n 


- Vn iv) 


j:Ei=i 


with the convention = 0 when 4n — 0- Recall from Section 2.3 that in the hidden Markov 
model and sequential simulation examples 7 ra(l) is respectively the marginal likelihood and ratio of 
normalizing constants, hence our interest in {(p) with specifically p = 1. 

Corollary 1. In the case that Cp = 1 for all p G {0,..., n}, 

NV^^{l) = ^Y.(*n-^y-n + Opil/N), ( 6 ) 

ie[N] 

- 77^ ip)) = ^ E (7) 

iefAf] 


The proof is in the supplement. 

Since the first term on the right hand side of ( 6 ) can be interpreted as a sample 

variance of the reflecting variation in the numbers of time n descendants across the population 

of time 0 particles. Since = 0, the first term on the right hand side of (7) can be interpreted 

as a sample variance which reflects both variation in the #^’s and the deviations of the familial means 
-i fid) from the population mean 77 ^ {ip). Note that when n = 0, Eq = i always and 




1 

No[No - 1 ) 


iefAfo] 


Vo 


which is in keeping with being independent and identically distributed according to 779 . 


3 Moment properties of the particle approximations 

3.1 Genealogical tracing variables 

Our next step is to introduce some auxiliary random variables associated with the genealogical structure 
of the particle system. These auxiliary variables are introduced only for purposes of analysis: they will 
assist in deriving and justifying our variance estimators. Given (A, (j'), the first collection of variables, 
= (ATg,... is conditionally distributed as follows: Kf is uniformly distributed on [A„] and 

for each p = n — 1,...,0, Given (A, C) and K^, the second collection of variables, 

= {Kq, ... ,K4), is conditionally distributed as follows: is uniformly distributed on [A„] and 

for each p = n — 1,..., 0 we have Kp = if Kpj^^ 4 ^l+i Kp ~ C(Gp(Cp),..., Gp{C,p^)) if 

Kpj^i = Kpj^i- The interpretation of is that it traces backwards in time the ancestral lineage of 
a particle chosen randomly from the population at time n. is slightly more complicated: it traces 
backwards in time a sequence of broken ancestral lineages, where breaks in the lineages occur when 
components of and coincide. 
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3.2 Lack-of-bias and second moment of 7 ^( 93 ) 

We now give expressions for the first two moments of 7 ^ {ip). 

Lemma 1. For any ip G C{X), E | 7 ^= 7 „((/j) and E { 7 ^((/s)} = 7 „((/?). 

The proof is in the supplement. This lack-of-bias property E { 7 ^(v’)} = 7 n(</ 3 ) is quite well known 
and a martingale proof for the Np = N case can be found, for example, in Del Moral [2004, Ch. 9]. 

In order to present an expression for the second moment of 7 ,^ {p ), we now introduce a collection 
of measures on denoted {/if, : b G B^} where Bn = {0,1}"+^ is the set of binary strings of length 
n -I- 1. The measures are constructed as follows. For a given b G Bn, let (Xp, Xpo<p<n be a Markov 
chain with state-space X^, distributed according to the following recipe. If &o = 0 then Xq Mq and 
Xq ~ Mq independently, while if 60 = 1 then Xq = Xq ^ Mq. Then, for p = 1,..., n, if = 0 then 
Xp ^ Mp{Xp-i, •) and Xp ^ Mp{Xp_^, •) independently, while if = 1 then Xp = Xp ^ Mp{Xp-i, •). 
Letting Eb denote expectation with respect to the law of this Markov chain we then define 


/if, (S') = Eb 


n—1 


I{(X„,0 G S} n Gp{Xp)GpiX'p) , 

P—0 


S G b G Bn. 


Similarly to (3) we shall write pb{p) = Eb ^p{Xn, Xn)Y\p^Q Gp{Xp)Gp{Xp)'^, for p G C{X®‘^) and 
b G Bn- 


Remark 1. Observe that with 0„ G Bn denoting the zero string, pQ^ipp®'^) = 7 „(i/ 3 )^. 
Let [IVo:n] = Wo] X • • • X [A^„], and for any b € Bn, 

1(6) = {{k^,k^) G [iVo:„]^ : for each p, kp = kp bp = 1}, 


which is the set of pairs of [A{):n]-valued strings which coincide in their p-th coordinate exactly when 

bp = 1 . 

Lemma 2. For any p G WX®"^) and b G Bn, 


E 


i{(ib\K2) gi(6)}7):^(i)V(c5,c5 




2 \l-6p 


Pb{p) 


and 


E n 

6eB„ p=0 




l — bj, 


( 8 ) 

(9) 


The proof of Lemma 2 is in the supplement and uses an argument involving the law of a doubly 
conditional sequential Monte Carlo algorithm [see also Andrieu et ah, 2016[. The identity (9) was first 
proved by Cerou et al. [201 1[ in the case where Np = N. Our proof technique is different: we obtain 
(9) as a consequence of ( 8 ). The appearance of K^,K^ in ( 8 ) is also central to the justification of our 
variance estimators below. 


3.3 Asymptotic variances 

For each p G (0,..., n}, we denote by Cp G Bn the vector with a 1 in position p and zeros elsewhere. 
As in Remark 1, On denotes the zero string in Bn- The following result builds upon Lemmas 1-2. It 
shows that a particular subset of the measures {/if, : 6 G Bn}, namely /io„ and {/iej, : p = 0,... ,n}, 
appear in the asymptotic variances. 
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Lemma 3. For any ip € C{X), define 


o,n(v^) — 


7n(l)^ 


p G {0 ,...,n}. 


( 10 ) 


Then Nva.r {7^(V2)/7n(l)} ^ Ep=o S ^^^p.n(v2) and 


NE 


{VuiT) - Vnip)}'^ ^ X! S - Vnip))- 

P—0 


( 11 ) 


The proof of Lemma 3 is in the supplement. 

Remark 2. In light of Lemma 3, the map cr^ in Proposition 1 satisfies 

n 

^^^(^5) = T&C{X). (12) 

p—0 

An expression for Vp^n{p) in terms of {Mp,Gp)Q<p<n is obtained by observing that if we define 

Qp (^^p—i 1 — Gp—\ (xp—1 ')Mp{xp —\, dxp), 

and Qn,n = Id, Qp,n = Qp+i ■ ■ ■ Qn for p G {0,... ,n-l}, then pe^ip) = lp{Qp,n{T)‘^)- In combination 
with Remark 1, we obtain 


I ^ lp{I)lp{Qp,n{p)'^) / n2 VpiQp,n{p)‘^) / x2 

VpAt) = — -f^7Tl2- 


7n(l)^ 




(13) 


4 The estimators 


4.1 Particle approximations of each fib 

We now introduce particle approximations of the measures {ph ■ b G i?n}, from which we shall subse¬ 
quently derive the variance estimators. For each b G Bn, and p G £(4’®^) we define 


t^nT) = 


n (^p)' 

_p—0 


Nr, 


Np-l 


l — b„ 


AafE i{{K\K^)Giib)}p{(:A,CnV\Ac ■ ( 14 ) 


Recalling from Section 3.1 that given A and Kn and are conditionally independent and each 
uniformly distributed on [Nn], it follows from (14) that 

In Ay = 7^(1)^!^ Yi ACuMA) 


iJe[Nn 


= 7 ^( 1 )" E ^ H{K\K^)Gi{b)}p{(:A)ACV\A,c 


bGBn 


E n 

lp=0 


Nr 


1 - 


Nr 


l-b„ 




( 15 ) 


mirroring (9). This identity is complemented by the following result. 
Theorem 2. For any b G Bn and p G C{X®‘^), 

1. E[p ^((/?)} = pbA) for all N >1, 
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2 . supjv>iiV£; -pib(v?)}' 


< oo and hence /i^(</5) —^ h'b{‘f) 'in probability. 


The proof of Theorem 2 is in the supplement. Although (14) can be computed in principle from 
the output of Algorithm 1 without the need for any further simulation, the conditional expectation 
in (14) involves a summation over all binary strings in 1(6), so calculating /r^((^®^) in practice may 
be computationally expensive. Fortunately, relatively simple and computationally efficient expressions 
are available for /i^((^®^) in the cases b — On and b = Cp, and those are the only ones required to 
construct our variance estimators. 


4.2 Variance estimators 


Our next objective is to explain how (4) is related to the measures and to introduce another family 
of estimators associated with the individual terms in (12). We need the following technical lemma. 

Lemma 4. The following identity of events holds: ^ En"’'^ = G I(0„)}. 

The proof is in the appendix. Combined with the fact that given (A, C), are independent 

and identically distributed according to the uniform distribution on [Vi], we have 


E 


l{{K\K^)Gl{On))T{C\C-)\AX 


= N, 


-2 


E 

i.r.Ei^El 




(16) 


and therefore we arrive at the following equivalent of (4) , written in terms of , 


v^{t) 




7^(1)^ ■ 


(17) 


Detailed pseudocode for computing Vn {t) in 0{N) time and space upon running Algorithm 1 is 
provided in the supplement. 

Mirroring (10), we now define 


p,n 


(t) = 


7^(1)^ 


p e {0, 


..N 


(p) = 


/ ^ ^p^n 
p—0 


ip)- 


Detailed pseudocode for computing each u^„(i^) and {(p) with time and space complexity in 0{Nn) 
time upon running Algorithm 1 is provided in the supplement. The time complexity is the same as 
that of running Algorithm 1, but the space complexity is larger. Empirically, we have found that 
NVn {p) is very similar to Vn {p) as an estimator of cff{p) when N is large enough that they are both 
accurate, and hence may be preferable due to its reduced space complexity. 

Theorem 3. For any p G E{X), 

E { 7 " {^fv^ni.P>)} = ln{lfvp^n{p) for all N>1, 

^p^nip) np^n{p) and Vp^niP - Vn i.p)) '^P,n(p “ ilnip)), both in probability, 

3. E { 7 ,^(v?)} = 7 n(l)^CT^(i^) for all N > 1 and (p) cr^ip) in. probability. 


5 Estimators for updated measures 

In some applications there is interest in approximating the updated measures: 

% (S) = (^)7n (da;), fln{S) = ^ 

8 


S gx. 









In the hidden Markov model setting described in Section 2.2, e.g., is the conditional distribution of 
Xn given uq, ... ,yn, that is is a filtering distribution, while r]n is a predictive distribution. 

The updated particle approximations are defined by 

7^(5) = [ G„(x)7^(dx), 5 e 

J S In V-*-/ 

and we now define their variance estimators. To facilitate this task, we consider a fixed ip S >C(T’), 
and define ip{x) = Gn{x)ip{x). The following relationships can then be deduced: 7n(v^) = ln{'p)^ 
= rjniip)/T]n{Gn), 7n iv>) = In i<f) and f}^(if) = T]^(if)/ (G„). We define analogues of and 
Vp^n for the updated particle approximations as 


^niv) = Jim Nv&t (:p)/7n(l)} , 

ly —¥oo 


I’pA’f) 


Vp,n{<p) 

??«(G„)2 ’ 


and the proposition below is a counterpart to Proposition 1 and Lemma 3. 


Proposition 2. For any ip G C-{X), 

n 

1. {p) -)• %{p) almost surely and aKp) = ^ c~^Vp^n{F), 

p—0 


2. f]^ {p) fjnip) almost surely and NE {fj^ (p) - ^ a^^ip - f}n{p))- 


The proofs of Proposition 2, and Theorems 4-5 below can be found in the supplement. Proposi¬ 
tion 2 implies the relationship d'‘^{p) = a^{p)/rjniGnY'■ The corresponding estimates of the variance, 
asymptotic variance and the terms therein are now obtained and analogues of Theorems 1 and 3 fol¬ 
low straightforwardly. Below we write the estimators , Vp^ etc. in terms of and Vp^ to 

emphasize that the same algorithms can be used to compute them, just as {p) and fj^ {p) can be 
computed as 'fni’f) and {‘p)/Vn {Gn), respectively. 


Theorem 4. For any p G FIX'), with 




(18) 


F =vaT{j^{p)} for all N >1, 


2. NV^{p) —>■ a-'^{p) in probability, 

3. NV^{p — f}^ ((p)) afi^p — fin{p)) in probability. 


Remark 3. It follows from (4), (17), (18) and simple manipulations that 

IFtVn (.F — Vn (f)) _ ^ — fin (V^) } 

(rip^ol^) ielNo][ Eje[jv„]G'„(Cn) J 

the right hand side of which is, in the case where N is not time-varying, precisely the estimator in 
Equation 2.9 of Chan and Lai [2013]. 

Theorem 5. For any p G F{X), with 

n 

Vp,niF) = Vp^ni.Flhn {Gnf, Vn {f) = C-^^p.niF), 

P—0 

1- E {t'J' (1)2-()^„((^)} = %{Vfvp^n{F) for all N>1, 

l’p,ni‘P) Vp.ni^p) and - Vn (<^)) Vp,ni<P - Vni^p)), hoth in probability, 

3. E {jn {l)^Vn (v^)} = ln{lYCTnl.'-p) for all N > 1 and v))(p) iri probability. 
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6 Use of the estimators to tune the particle filter 

The variance estimators we have proposed can of course be applied directly to report estimates of 
Monte Carlo error alongside particle approximations. Estimates of quantities such as Vp^n{^) may also 
aid algorithm and design. We provide here two simple examples of adaptive methods to illustrate this, 
firstly concerning how to improve performance by allowing particle numbers to vary over time, and 
secondly concerning how to choose particle numbers so as to achieve some user-defined performance 
criterion. To simplify presentation, we focus on performance in estimating (ip), the ideas can easily 
be modified easily to deal instead with {p), (ip) or 77^ (p). 

6.1 Asymptotically optimal allocation 

The following well known result is closely related to Neyman’s optimal allocation in stratified random 
sampling [Tschuprow, 1923, Neyman, 1934]. A short proof using Jensen’s inequality can be found in 
Glasserman [2004, Section 4.3]. 

Lemma 5. Let oq, ..., > 0. The function (cq, ... ,Cn) >—>■ Up=o ^ minimized, subject to the 

constraints min^ Cp > 0 and Up=o Cp = n + 1, at (n + 1)~^ Il®ll2 when Cp oc 

As a consequence, we can in principle minimize by choosing Cp oc rip An approxima¬ 

tion of this optimal allocation can be obtained by the following two-stage procedure. First run a particle 
hlter with Np = N to obtain the estimates Vp^{p) and then define com by Cp = max{(</?),5(A^)}, 
where g is some positive but decreasing function with limvr-s.oo 5(A^) = 0. Then run a second par¬ 
ticle filter with each Np = [cpA^], and report the quantities of interest, e.g., ')^{ip). The function 
g is chosen to ensure that Cp > 0 and that for large N we permit small values of Cp. The quantity 
Up=o obtained from the first run, is an indication of the improvement in 

variance obtained by using the new allocation. 

Approximately optimal allocation has previously been addressed by Bhadra and lonides [2016], 
who introduced a meta-model to approximate the distribution of the Monte Carlo error associated 
with log 7,^(1) in terms of an autoregressive process, the objective function to be minimized then 
being the variance under this meta-model. They provide only empirical evidence for the fit of their 
meta-model, whereas our approach targets the true asymptotic variance directly. 

6.2 An adaptive particle filter 

Monte Carlo errors of particle filter approximations can be sensitive to N, and an adequate value of N 
to achieve a given error may not be known a priori. The following procedure increases N until {p) 
is in a given interval. 

Consider the case where we wish to estimate 7„ (</?). Civen an initial number of particles and 
a threshold J > 0, one can run successive particle filters, doubling the number of particles each time, 
until the associated random variable V.J^^^\p) G [0,(5]. Finally, one runs a final particle hlter with 
particles, and returns the estimate of interest. We provide empirical evidence in Section 7 that 
this procedure can be effective in some applications. 

7 Applications and illustrations 

In this section we demonstrate the empirical performance of the estimators we have proposed in three 
examples. Our numerical results mostly address the accuracy of our estimators of the asymptotic 
variance <j‘^[p), the individual terms Vp^n(T): the effectiveness of the applications described in 

Section 6 with the test functions p = 1 and p = Id, the identity function. Where the results for later 
examples are qualitatively similar to those of the hrst, the corresponding hgures can be found in the 
supplement. 
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Figure 2: Estimated asymptotic variances NV^{(f) (dots and error bars for the mean ± one standard 
deviation from 10^ replicates) against log2 N for the linear Gaussian example. The horizontal lines 
correspond to the true asymptotic variances. The sample variances of 7^(1)/7„(1) and f]^{Id), scaled 
by N, were close to their asymptotic variances. 



Figure 3: Plot of h^ji(l) (dots and error bars for the mean ± one standard deviation from 10^ replicates) 
and Vp^n{^) (crosses) at each p G {0 ,... ,n} for the Linear Gaussian example, with N = 10®. 


7.1 Linear Gaussian hidden Markov model 

This model is specified by Mo (•) = M{-',0,1), Mp{xp-i, ■) = fd'{-;0.9xp-i,l) a,nd Gp{xp) = A/"(j/p; ccp, 1). 
The measures ?)„ and 7„ are available in closed form via the Kalman filter, and for suitable f the 
quantities Vp^n{f) etc. can be computed exactly, allowing us to assess the accuracy of our estimators. 
We used a synthetic dataset, simulated according to the model with n = 99. A Monte Garlo study 
with 10'^ replicates of {f) for each value of N and Cp = 1 was used to measure the accuracy of the 
estimate NV^{f) as N grows; results are displayed in Figure 2 and for this data (T^(1) = 295.206 
and d'^{Id — fjn{Id)) ~ 0.58. The estimates {f) differed very little from NV^{f), and so are not 
shown. We then tested the accuracy of the estimates f}^„(l); results are displayed in Figure 3. The 
estimates Vp^^{Id — fj^{Id)) are very close to 0 for p < 95 and with values (0.0017, 0.012, 0.082,0.48) 
for p G {96, 97,98,99}; this behaviour is in keeping with time-uniform bounds on asymptotic variances 
obtained by Whiteley [2013], see also references therein. 

We also compared a constant N particle filter, the asymptotically optimal particle filter where the 
asymptotically optimal allocation is computed exactly, and its approximation described in Section 6.1 
for different values of N using a Monte Garlo study with 10^ replicates. We took g{N) = 2/log2 N 
in defining the approximation, and the results in Figure 4a indicate that indeed the approximation 
reduces the variance. The improvement is fairly modest for this particular model, and indeed the exact 
asymptotic variances associated with the constant N and asymptotically optimal particle filters differ 
by less than a factor of 2. In contrast. Figure 4b shows that the improvement can be fairly dramatic 
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(a) (b) 

Figure 4: Logarithmic plots of the sample variance across 10^ replicates of 7 ^( 1 )/ 7 „( 1 ) against N 
for the linear Gaussian example, using a constant N particle filter (dotted), the approximation to the 
asymptotically optimal particle filter (dot-dash), and the asymptotically optimal particle filter (solid). 
In Figure 4b, the observation sequence is j/p = 0 for p € {0,..., 99} \ {49} and j /49 = 8 . 



(a) 


(b) 


Figure 5: Logarithmic plots for the simple adaptive N particle filter estimates of 7 n(l) for the linear 
Gaussian example. Figure (a) plots the sample variance of (l)/ 7 n(l) against 6 , with the straight 
line y = X. Figure (b) plots N against S, where N is the average number of particles used by the final 
particle filter. 


in the presence of outlying observations; the improvement in variance there is by a factor of around 
40. Finally, we tested the adaptive particle filter described in Section 6.2 using 10^ replicates for each 
value of (5; results are displayed in Figure 5, and indicate that the estimates of 7 n(l) are close to their 
prescribed thresholds. 

7.2 Stochastic volatility hidden Markov model 

A stochastic volatility model is defined by Mo(-) = A/" { •; 0, cr^/(l — p^)}, Mp{xp-i, •) = Af{ ■; pxp-i, cr^) 
and Gp{xp) = N{yp] 0, /3^ exp(a;p)). We used the pound/dollar daily exchange rates for 100 consecutive 
weekdays ending on 28th June, 1985, a subset of the well-known dataset analyzed in Harvey, Ruiz and 
Shephard (1994). Our results are obtained by choosing the parameters (p, tr,/3) = (0.95,0.25,0.5). 
We provide in the supplement plots of the accuracy of the estimate NV^ {ip) as N grows using 10^ 
replicates for each value of N; the asymptotic variances (t^( 1) and a'^^Id — fin{Id)) are estimated as 
being approximately 354 and 1.31 respectively. In the supplement we plot the estimates of Vp^nip)- We 
found modest improvement for the approximation of the asymptotically optimal particle filter, as one 
could infer from the estimated Vp^nisp)- For the simple adaptive N particle filter, results are provided 
in the supplement, and indicate that the estimates of 7 „( 1 ) are close to their prescribed thresholds. 
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10 

Figure 6: Plot of Vp„{(p) (dots and error bars for the mean ± one standard deviation) at each p G 
{0,... ,n} with fc = 10 iterations (a)-(b) and k = 1 iteration (c)-(d) for each Markov kernel in the 
SMC sampler example and N = 10®. 


7.3 An SMC sampler 

We consider a sequential simulation problem, as described in Section 2.3, with X = M, 7ro(x) = 
^(0,10^) and 7ri(x) = 0.3A/"(x; —10, 0.1^) + 0.7A/’(x; 10, 0.2^). The distribution tti is bi-modal with 
well-separated modes. With n = 11, and the sequence of tempering parameters 

= (0,0.0005,0.001,0.0025,0.005,0.01,0.025,0.05,0.1,0.25,0.5,1), 

we let each Markov kernel Mp, p G {!,...,n} be an r^p-invariant random walk Metropolis kernel 
iterated fc = 10 times with proposal variance , where ti:„ = (10, 9,8 , 7,6, 5,4, 3, 2,1,1). 

One striking difference between the estimates for this model and those for the hidden Markov models 
above is that the asymptotic variance a^{Id— r]n{Id)) ^ 822 is considerably larger than cr^(l) ~ 2.1; 
the variability of the estimates NV^{(p) is shown in the supplement. Inspection of the estimates of 
VpPp) in Figures 6 allows us to investigate both this difference and the dependence of VpPp) on k 
in greater detail. 

In Figure 6(a)-(b) we can see that while Xp_„(l) is small for all p, the values of VpPid — rin{Id)) 
are larger for large p than for small p; this could be due to the inability of the Metropolis kernels 
{Mq)qyp to mix well due to the separation of the modes in (j]q)q>p when p is large. In Figure 6(c)-(d), 
k = 1, that is each Mp consists of only a single iterate of a Metropolis kernel, and we see that the 
values of VpPp) associated with small p are much larger than when k = 10, indicating that the larger 
number of iterates does improve the asymptotic variance of the particle approximations. However, 
the impact on Vp^np) is less pronounced for large p. Results for the simple adaptive N particle filter 
approximating rjn{Id) are provided in the supplement, which again show that the estimates are close 
to their prescribed thresholds. 

8 Discussion 

8.1 Alternatives to the bootstrap particle filter 

In the hidden Markov model examples above, we have constructed the Feynman-Kac measures taking 
Mq ,..., Mn to be the initial distribution and transition probabilities of the latent process and defining 
C?0i • ■ • I Gn to incorporate the realized observations. This is only one, albeit important, way to con¬ 
struct particle approximations of ??„, and the algorithm itself is usually referred to as the bootstrap 
particle filter. Alternative specifications of {Mp, Gp)o<p<n lead to different Feynman-Kac models, as 
discussed in Del Moral [2004, Section 2.4.2], and the variance estimators introduced here are applicable 
to these models as well. 
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Figure 7: Plot of (dots and error bars for the mean ± one standard deviation) and hp,n(l) 

(crosses) at each p G {0,..., n} in the Linear Gaussian example. 


One particular specification corresponds to the “fully adapted” auxiliary particle filter of Pitt and Shephard 
[1999], as discussed by Doucet and Johansen [2008]. Specifically, we define Mo(dxo) = Mo{dxo)Go{xo)/MQ{Go), 
and 


A/p(xp_i, dxp) — 


ATp(xp_ 1, dxp^Gp(xp ) 
Mp{Gp){xp-i) 


P G [n], 


and then (70(2^0) = Mq{Go)Mi{Gi){xq) and Gp{xp) = Mp+i(Gp+i)(xp), p > 1. If we denote by 7„ 
and fjn the Feynman-Kac measures associated with {Mp, Gp)Q<p<ny we obtain 7„ = 7„ and ?)„ = ?)„. 
Moreover, the variances of {(p) and fj^ (ip) are often smaller than the variances of (p) and (p). 
In Figure 7, we plot the corresponding hp_„(l) and their approximations for the same linear Gaussian 
example in Section 7.1. Here, the asymptotic variance of 7,^(1)/7„(1) is 40.718, more than 7 times 
smaller than (7^(1). 


8.2 Estimators based on i.i.d. replicates 

It is clearly possible to estimate consistently the variance of 7)^((p)/7„(l) by using i.i.d. replicates of 7)^. 
Such estimates necessarily entail simulation of multiple particle filters. We now compare the accuracy 
of such estimates with those based on i.i.d. replicates of V^{p). For some p G C{X) and H S N, let 
Inpip) replicates for i G [B], and dehne M = ^7)^j(l). A standard 

estimate of var {7)^(7’)/7n(l)} is obtained by calculating the sample variance of ^{p)-, i G [H]}. 

Noting the lack-of-bias of 7)^(l)^I^ {p), an alternative estimate of var [7)^((p)/7„(l)] can be obtained 
as Both these estimates can be seen as ratios of simple Monte Carlo 

estimates of var {7)^(p)} and 7n(l)^, and are therefore consistent as B ^ 00 . We show in Figure 8 
a comparison between these estimates for the three models discussed in Section 7 with N = 10^ and 
p = 1, and we can see that the alternative estimate based on 1^(1) is empirically more accurate for 
these examples. 


8.3 Final remarks 

The particular approximations developed here provide a natural way to estimate the terms appearing in 
the non-asymptotic second moment expression (9). To the best of our knowledge, we have also provided 
the first generally applicable, consistent estimators of Vp^n{p)- The expression (9) does not apply to 
particle approximations with resampling schemes other than multinomial, and one possible avenue of 
future research is to investigate estimators in these other settings. Whilst we have emphasized variances 
and asymptotic variances, the measures pb also appear in expressions which describe propagation of 
chaos properties of the particle system. For instance, in the situation Np = N, the asymptotic bias 


14 
















0.6 


0.5- 

o 

ro 

~ 0.4 ■ 
to 


■§ 0 . 2 - 


0.1 - 


5 10 20 50 100 5 10 20 50 100 5 10 20 50 100 

B B B 

Figure 8: Plot of the standard estimate of var [7 ^(</j)/ 7„(1)] (gray dots and error bars) and the 
alternative estimate using (1) (black crosses and error bars) against B in (left to right) the examples 
of Sections 7.1-7.3. 



O 0.4- 
ro 
_E 

to 1 

0) 0.3 
0) 
o 
c 

ro , 


- X 


K 


formula of Del Moral et al. [2007, p.7.] can be expressed as 




n — 1 

-E 

p—O 




p,n 


^pQp,n(l) 


(<7’ - ll ni^))} ^ ^ 


{1 (8> ((/? - 77„((/j))} 


p—O 




which could be consistently estimated by replacing fj,e^ and 7n(l) by and (1). Finally, the 
technique used in the proof of Lemma 2 can be generalized to obtain expressions for arbitrary positive 
integer moments of 7,^ {(p). 


Supplementary Material 

The supplementary material at http://www.warwick.ac.uk/alee/vestpf_supp.pdf includes algo¬ 
rithms for efficient computation of the variance estimators, and proofs of Corollary 1, Lemmas 1-3, 
Propositions 1-2, and Theorems 2, 4 and 5. 


Appendix 

Proof of Theorem 1. Throughout the proof, —> denotes convergence in probability. For part 1., the 
fact = 'ynipY and Theorem 2 together give 

E {7„"^(1)VX(7>)} = E = E {7)r(^)^} - Inipf = var {7„^(^)} . 

For part 2., combining the identity (15), /i^(7’®^) —t by Theorem 2, and the fact that for any 

( \ \ ^ 

^ j (1 — is in 0{N~‘^), we obtain 


iSiff - + OAN-% 


(19) 


kp=0 


Also noting that by Proposition 1 7)^(1)^ —t 7„(1)^, from (10) that jnO)‘^Vp,n{T) = /-tep(<7®^) — 
po„(<7®^) and again using —>• /rf,((^®^), we then have 


In V-^/ 


( 20 ) 


1=0 '^P 
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For part 3., first note that by Theorem 2 and Proposition 1, for any b G i?„, 


- Vn - Vn {‘f)[lJ-b ‘P)] + Vn ^^b 

from which it follows that (19) also holds with ip replaced hy p — rj^ (p), and then 

^ yp,n{P - Vn{p)) 


NV^ip-r^‘:ip)) 


P—0 


= ^n{p-Vn{p)), 


similarly to (20). 


□ 


Proof of Lemma f. For i G [A^„] define and i?p_i = for p G [n — 1]. Since in 

Algorithm 1, i?® = E for all p G [n],* G [Np] , a simple inductive argument then shows that 


El =E., 


b; 

'p j 


p G {0,... ,ri}, i G [Af„]. 


( 21 ) 


We shall now prove {K^,K'^) G I(0„) => ^ En"■ Recall from Section 3.1 that when {K^,K^) G 


k: 


I(0„), we have ^ = A^f^ for allp G [n], hence ^ = R,^". Applying 

(21) with p = 0 and using the fact that in Algorithm 1, FIq = i for all i G [A^n], we have El = Eq° = Bq, 


.K. 


hence En'' = B^' 


Q 7^ Bq" = En" as required. It remains to prove {K^,K^) ^ I(0„) => En" = E, 

^Kl 


_ p/^n 
0 — 

Assuming {K^,K‘^) ^ I(0„), consider t = max{p : Kp = Kp}. li t = n then clearly En" = En 
so suppose T < n. It follows from Section 3.1 that Bt"^ = = Kf = Rt ”, so taking p = t and 

i = KlKlm (21) gives Flf" = Flf ”. □ 

Proof of Theorem 3. For part 1., Theorem 2 gives 

E [in = E |p^(<p®^) - p^„((p®^)| = Pep(</5®^) “ M0„(<P®^) = InOf Vp^ri{p) ■ 

For the remainder of the proof, ^ denotes convergence in probability. For part 2., (<P®^) — 

7n(l)^i®p,«((/?) by Theorem 2, and (1)^ “t 7n(l)^ by Proposition 1, so w^„(p) = 
hAA? Vp^n{<p)-, as in the proof of Theorem 1, {[p - ■qli{p)f>A 

Pb{[p - Pn(‘p)]®^) gives vAAp - vA ip)) 'Vp,niP - vAp))- Part 3. follows from parts 1. and 2. □ 
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